OpenAI History and Principles

Analysis

2023-11-08 4:37 PM

Stratechery on OpenAI Dev Conference

Professor Clayton Christensen’s theory of integration and modularity holds that integration works better when a product isn’t good enough; only once a product exceeds expectations is there room for standardization and modularity.

But consumer products are never good enough. Amazon’s customer obsession is a never-ending battle, because customers are constantly becoming more demanding. Same with ChatGPT, which has turned OpenAI into an “accidental consumer tech company”.

The result, Ben Thompson speculates, is that OpenAI may ultimately be forced to make its own hardware in order to remain competitive with platforms from Apple and Google that can satisfy the consumer’s never-ending need for integration.

Governance

March 2024: Analysis of the OpenAI Board by Zvi says the new members are mostly unknowns in the world of AI, including the former CEO of the Gates Foundation, a Pfizer board member, and others.

“The Coup”

Zvi gives his detailed take on Lex Fridman’s interview with Sam Altman.

Sam Altman’s Firing and Reinstatement

2023-11-27 7:44 AM

The Substack The Pragmatic Engineer has a detailed timeline, What is OpenAI, Really?, showing precisely how the “coup” happened, including good details of how the employee PPU (Profit Participation Units) compensation package works essentially like stock options to incentivize employees. It also argues that the 100x cap on profits is effectively meaningless when you consider that even a company like Apple comes nowhere near that level of return.

2023-11-21

Although ultimately it didn’t happen this way, one possible outcome of the “coup” was described by Ben Thompson in OpenAI’s Misalignment and Microsoft’s Gain:

you can make the case that Microsoft just acquired OpenAI for $0 and zero risk of an antitrust lawsuit

Nov 18: another good step-by-step summary of the OpenAI debacle from thatwastheweek distinguishes between the “e/a” (effective altruism) lobby, with its pessimistic view of AI, and the less organized but optimistic “e/acc” (effective accelerationism) lobby.

2023-11-18

Altman was fired, suddenly, and apparently with cause. Another key figure, co-founder and former chairman Greg Brockman, quit at the same time, along with multiple employees.

In a blog post on Friday, the company said that Mr. Altman “was not consistently candid in his communications” with the board, but gave no other details.

My first thought is that, wow, this must be something extremely serious. Although some rumors claim it has to do with some fundamental philosophical differences – presumably about AI Safety – I have a hard time accepting that anyone would quit purely based on principle.

More likely, I think there’s some combination of financial malfeasance driven by greed, plus a “moral equivalence” self-deception. Maybe he/they were involved in some obvious illegal activity which they tried to justify under the guise that “this is for the good of mankind”, or “I’m doing so much good for the world, I deserve a couple of mistakes”.

Turns out my uninformed guesses were wrong. Instead, it appears that a few board members just didn’t like Altman’s style and thought it was their job to get rid of him. It wasn’t well thought-through, and the board members weren’t particularly well-qualified in the first place and so…they just did something stupid.


Technical Details

George Hotz says:

Sam Altman is not going to tell you that GPT-4 is 240B parameters and is a 16-way mixture model with 8 sets of weights.


GPT-4 Details Leaked

Uses a total of ~1.8 trillion parameters across 120 layers.

Mixture of experts (MoE) model

Uses 16 experts within the model, each with ~111B parameters for its MLP. Two of these experts are routed to per forward pass.

MoE Routing:

While the literature talks a lot about advanced routing algorithms for choosing which experts to route each token to, OpenAI’s is allegedly quite simple for the current GPT-4 model (a minimal top-2 gating sketch follows this list).

Uses about ~55B shared parameters for attention.
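
The leak doesn’t describe the router beyond “quite simple,” but the simplest common scheme matching that description is plain top-2 gating: a linear layer scores all 16 experts and each token goes to the two highest scorers. Below is a hypothetical sketch of that technique (the class name, dimensions, and expert MLP shape are mine, scaled down for illustration, not GPT-4’s):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    """Hypothetical top-2 MoE layer: n_experts expert MLPs, 2 active per token."""
    def __init__(self, d_model=512, d_ff=2048, n_experts=16):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)   # router: one logit per expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                            # x: (n_tokens, d_model)
        logits = self.gate(x)                        # (n_tokens, n_experts)
        top_vals, top_idx = logits.topk(2, dim=-1)   # two best experts per token
        top_w = F.softmax(top_vals, dim=-1)          # renormalize over the chosen two
        out = torch.zeros_like(x)
        for k in range(2):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, k] == e            # tokens whose k-th pick is expert e
                if mask.any():
                    out[mask] += top_w[mask, k].unsqueeze(-1) * expert(x[mask])
        return out
```

Only two of the sixteen expert MLPs run for any given token, which is why the active parameter count during inference (next item) is far below the 1.8T total.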

Inference:

Each forward pass inference (generation of one token) uses only ~280B parameters and ~560 TFLOPs. This contrasts with the ~1.8 trillion parameters and ~3,700 TFLOPs that would be required per forward pass of a purely dense model.
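
The ~280B figure follows directly from the MoE numbers above, and the compute ratio matches the usual rule of thumb that a transformer forward pass costs about 2 FLOPs per active parameter per token. A quick sanity check (all inputs are the leak’s claims, not confirmed figures):

```python
expert_params     = 111e9   # per-expert MLP parameters (leaked claim)
experts_per_token = 2       # top-2 routing
shared_attention  = 55e9    # shared attention parameters (leaked claim)

active = experts_per_token * expert_params + shared_attention
print(f"active params per token: {active / 1e9:.0f}B")  # ~277B, i.e. the ~280B claim

# Forward-pass FLOPs scale with active parameters (~2 FLOPs per param per
# token), so the sparse/dense compute ratio tracks the parameter ratio.
dense_params = 1.8e12
print(f"sparse/dense compute ratio: {active / dense_params:.2f}")  # ~0.15, matching 560/3700
```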

Dataset:

GPT-4 is trained on ~13T tokens.

These are not unique tokens; repeated epochs are counted as additional tokens.

Epoch counts: 2 epochs for text-based data and 4 for code-based data (see the sketch after this list).

There are millions of rows of instruction fine-tuning data from ScaleAI and internal sources.
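
To make the epoch accounting concrete: the 13T total is unique tokens weighted by how many times each subset is seen. The split below is a made-up illustration, since the leak does not give unique text vs. code token counts:

```python
# Hypothetical split; the leak gives only the 13T total and the epoch counts.
text_epochs, code_epochs = 2, 4
unique_text = 5.5e12    # assumed unique text tokens
unique_code = 0.5e12    # assumed unique code tokens

total_seen = text_epochs * unique_text + code_epochs * unique_code
print(f"tokens seen during training: {total_seen / 1e12:.0f}T")  # 13T under these assumptions
```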

GPT-4 32K

Pre-training used an 8k context length (seqlen). The 32k-seqlen version of GPT-4 was produced by fine-tuning the 8k model after pre-training.

Costs:

If their cost in the cloud was about $1 per A100 hour, the training costs for this run alone would be about $63 million.

(Today, the pre-training could be done with ~8,192 H100 in ~55 days for $21.5 million at $2 per H100 hour.)
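
Both dollar figures are plain GPU-hour arithmetic. A quick reproduction using only the quoted rates, counts, and durations (the A100-hour total is derived from the $63M claim rather than stated directly):

```python
# H100 scenario: figures exactly as quoted above.
h100_count, h100_days, h100_rate = 8192, 55, 2.00        # $2 per H100-hour
h100_hours = h100_count * h100_days * 24
print(f"H100 pre-training cost: ${h100_hours * h100_rate / 1e6:.1f}M")  # ~$21.6M

# A100 scenario: back out the GPU-hours implied by $63M at $1/hour.
a100_rate = 1.00                                          # $1 per A100-hour
implied_a100_hours = 63e6 / a100_rate
print(f"implied A100-hours for the original run: {implied_a100_hours / 1e6:.0f}M")  # 63M
```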

HN Discussion

Other Technology

What is Q*? interconnects.ai tries to answer.

People

In the podcast Lunar Society, Dwarkesh Patel interviews Ilya Sutskever (OpenAI Chief Scientist). I like the rapid-fire questioning style, though generally the answers seemed fairly bland. My takeaway is that Ilya thinks his success comes from a 2015 bet on deep learning and the hard work and focus he applied after that. OpenAI’s progress is a series of little things done well together, and he suspects future improvements will be like that too, though you can’t rule out a significant breakthrough.

Lex Fridman interviews Sam Altman (March 2024): no real breakthrough news except the standard obsession with “safety”. Altman says he wants the government to regulate this, but:

 I realize that that means people like Marc Andreessen or whatever will claim I’m going for regulatory capture, and I’m just willing to be misunderstood there. It’s not true. And I think in the fullness of time, it’ll get proven out why this is important


OpenAI in Media

Technology Review The messy, secretive reality behind OpenAI’s bid to save the world (evernote)

OpenAI has consistently sat almost exclusively on the scale-and-assemble end of the spectrum. Most of its breakthroughs have been the product of sinking dramatically greater computational resources into technical innovations developed in other labs.

“Foresight” runs experiments to test how far they can push AI, but OpenAI hid the results for six months.

Scaling Laws for Neural Language Models is an OpenAI paper that fits empirical power laws describing how language-model loss falls with model size, data, and compute. Similar to another paper from MIT.
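
The paper’s central claim is that test loss follows clean power laws in parameter count N and dataset size D. A minimal sketch of those fits, with constants quoted from the paper as best I recall them (treat as approximate):

```python
# Power-law fits from the paper; N is non-embedding parameter count,
# D is dataset size in tokens. Constants are approximate recollections.
def loss_vs_params(n, n_c=8.8e13, alpha_n=0.076):
    return (n_c / n) ** alpha_n     # L(N) = (Nc / N)^alpha_N

def loss_vs_tokens(d, d_c=5.4e13, alpha_d=0.095):
    return (d_c / d) ** alpha_d     # L(D) = (Dc / D)^alpha_D

for n in (1e8, 1e9, 1e10):
    print(f"N = {n:.0e}: predicted loss {loss_vs_params(n):.2f}")
```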

Independent teams handle different “bets”: language, robotics, and others.

Diversity: of 120 employees, 25% are female (or “nonbinary”).

Links to an AI Now report that says only 18% of AI authors are female.